NPS ARCHIVE 


1963 
STONE, V. 


Dudley Knox Library, Nes 
Monterey, CA 93943 


ov. 


,£ 
= 


4 


-_ or Aas 


“ad 


> 


: 


i 


of) "i 


NAVAL POSTGRADUATE SCHOOL 


Monterey, California 


ga a re 9 


DESIGN OF RELATIONAL DATABASE BENCHMARKS 
by 
Vincent Courtney Stone 


June 1983 


Thesis Advisor: Dawiids Ke a niss-ao 


Approved for public release; distribution unlimited 


Stora 
-euaitzonae SEANATSO raserektaa, a0 meen | 


va 


\ 


e0032 Yahizved: snssntv 
ia : 
x) FPR wept | a 2 bay, 
ie 
} iy 


5 ee - a “veer 


= k 
9h268 42 Wed Sore sy) Oe ue | ‘ 
—- ~ . : : ; 
Betiation ecleyiidsarh ;Scocles sifdug: Set he veteee. i 


21 01ST : 


Dudley Knox Library, NPS 


Monterey, CA 95945 
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered) ’ 
READ INSTRUCTIONS 
ape "aan thet eR rg TR 


4. TITLE (and Sudtitie) 5. TYPE OF REPORT & PERIOD COVERED 
Master's Thesis 
June,. 1983 


6. PERFORMING ORG. REPORT NUMBER 


Design of Relational Database Benchmarks 


8. CONTRACT OR GRANT NUMBER(s) 


7. AUTHOR(@) 
Vincent Courtney Stone 


NT, PROJ 
\T NUMB 


10. PROGRA 
AREA & 


ELE 
ORK 


9. PERFORMING ORGANIZATION NAME ANO AOORESS 
Naval Postgraduate School 
Monterey, California 93940 


M ME ECT, TASK 
Ww UN ERS 


12, REPORT DATE 


CONTROLLING OFFICE NAME AND ADORESS 


Naval Postgraduate School June 1983 
Monterey, California 93940 13. NUMBER OF PAGES 
waz 


» MONITORING AGENCY NAME & AODRESS(if different from Controlling Office) 1S. SECURITY CLASS. (of thia report) 


UNCLASSIFIED 
SCHEDULE 


Approved for public release; distribution unlimited 


» DISTRIBUTION STATEMENT (of this Report) 


- DISTRIBUTION STATEMENT (of the abstract entered in Block 20, If different from Report) 


- SUPPLEMENTARY NOTES 


19. KEY WOROS (Continue on reveree side if neceeeary and identify by block number) 


benchmarking, database machines 


20. ABSTRACT (Continue on reverse aide if necessary and identify by block number) 


Performance measurements of a database machine reflect not only 
the processing power of the machine, but also the size and 
structure of the database. It is therefore useful to construct 
databases for performance measurements of database machines. 
Purthernore, it ts wseful. to utilize synthetic data, such that 
the volume of the reply can be predicted for a given query and the 
structure and attributes of the database can be varied (Continued) 


DD anos 1473 Edition oF | Nov 65 1s OBSOLETE 


RA SRD ZL 814-6601 1 secuRITY CLASSIFICATION OF THIS PAGE (When Dale Enterec 


AL TR en - — - 7 
* Linings keer Se 1 Fateh ¥" wale wenevoe 1 ee 


eoetioen sas dbo et , a ere 128, 


== a relean Goeth <6 Vivenhl kom embepeas MOMS ambbeet & ~ 
ein - 5. (4ar -3san seadeseb « Yo Ad eaesVEeRee i) bee a4 
bie onto 44 ei hud ,@@@cape sfz }o iewng Batepepete odd 
7. fulan > te a3 i sf 42 save -sas 40° B99 F0u5s IE 
. 7 / ) eineaetvests Sonenvuteeqg Se} dihads sab] 
vals eye eeitivu‘os Lvisea e¥ gl ,Seeeteds «2% 
ed? tea ¥ r eSiotbetqg ed gas ylges edd 2o edalov +47 
{' benweldno. fav saednieb ofy Yo eascedtagaa. bap stu. >vuz9s 


xiv sedi der vow se wort ET i, Go 


ual sal aa ERA a Tet he Cr ee hae 16-42 018 ME 


lS 
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered) 


ABSTRACT (Continued) Block # 20 


for intended test queries. Conducting measurement studies using 

a synthetic database contributes to the generality of the results 
when different test queries are employed. A parameterized program 
is described herein which can be used to generate various relat- 
ions for a synthetic database. The experiences in constructing 
and using the database generator are described. It is suggested 
that given sufficient information on real-world databases the 
generator may be useful for modeling them as well as for creating 
databases for benchmark tests. 


S/N 0102- LF- 014-6601 Ta 


ee EEEEEEEEEaEEEEEEEE SEE? 
SECURITY CLASSIFICATION OF THIS PAGE(When Date Entered) 


at 
iJ 


= F wsesbh ue, “ shoae fl ib. 
ig! bitece-Ieet fa wéltreria?nt 2. sok: obd he 
sian wets saileboa va the 


eneeatieatimnentn , -) 
. 7, ; ; 


ee ee Agere AP — 


Approved for public release, distribution unlimited. 


Design of 
Relational Database Pencnmarss 
by 
Vincent Courtney Stone 


Lieutenant Commander, United States Navy 
B.S., United States Naval Academy, 1974 


Submitted in partial fudtTiiinenat of tre 
requirements for tne degree ofr 


MASTER OF SCIENCE IN COMPUTER SCIENCE 
from tne 


NAVAL POSTGRADUATE SCHOOL 
June, 1983 


ABSTRACT 


Performance measurements of a database machine reflect 
not only the processing power of tne machine, but also the 
siz and structure of the database. It is therefore useful 
to construct databases for performance measurements of data- 
bDas2 macnines. Furthermore, it is useful EO Wewlaze 
Syntnetic data, sucn tmat tne volume of tne reply can be 
predicted for a given query and the Structure and attributes 
of the database can be varied for intended test queries. 
Conductine measurement Studies using a Syntnetic database 
contributes to the generality of the results wnen different 
test queries are employed. A parameterized prdgram Is “ace= 
scribed nerein which can be used to generate various 
relations for a Synthetic database. The expertences in con= 
structing and using the database generator are déscrived It 
MS Sureested that given surricient information on real-world 
databases the generator may be useful for modeling them as 


Welt asSP Por creating catabases* for benchmark Tests. 
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IT, BENCEMARKS FOR DATABASE MACHINES 


A. PERFORMANCES MEASUREMENTS 
In conparing database management systems ({TEMSs) an 
important factor is their performance. One way to compare 


DerSS. 4 15: 9 60» TUM specific. eppilicati onscameder ailverdiety “of 


systems. Bacn system can be “fine-tuned” to give the dest 
result. An evaluation based on such; a methodris costuy) and 
time-consuming. Often such a method may be inteasindle. In 


Many .cases,,. a database for, the specific applications may 
Bot.even exist. ASea secong metnod,.@an evaluation could be 
made on tne vasis of performance measurements of existing 
databases. TALS, metnod,is-dessrcostiytand. Lessntime=conusa= 
ming. However, tne following questions arise. Is the 
existing. database, sufficient: to .support:intenaed applica= 
tions? Are the applications good for condu ting relative 
performance evaluation of different DBMSs? 

its 1S .impractical -Loepenformcsuecm DEFeECt companrsonseor 
DENSSaosAdapting an.application touseverablisystems "foerceval= 
vation, —punposes .-1S . nOtiespractieal < Evaluation based on 
existing databases is Subject to interpretation error. The 
increasing numpner of DBMSs makes it imperative that some 
method is to oe devised to do coOmparative performance 


measurements. 
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B. BENCHMARKING 


in 


Fre couceptsof avystandard«sfor«measuring performance rsi 
NOt’ néw. The standard is usually Known as a bencnmark, 
after the markers used by surveyors in estabdl Sning a common 
reference point for their measurements. For exampl?2, Mount 
Diablo (a mountain east ot San Francisco) is used as the 
ref2rence point in surveying mucn of Nortnern California due 
to, Al tseclene-ranee’ visi bi Lity: Anenepnodrefor eméasurine 
Similar items in reference WAS alrstandardisis cabhbed 
henchmarxing. 

Precedents for pencnmarking exist in measuring the per- 
formance of computer systems. Bae oeGibpsensMix metned 
measures the execution time of a specific set of application 
pregrams for bencnmarking computer systems. The expected 
performance of a system could be computed by characterizing 
the expected workload aS a mix of jovs from the standara 
Set. 

It is proposed that a set of application programs can be 
devised to measure the performance of DEMSs. Psings opaese 
benchmark measurements, it will ve possible to compare tne 
performance of various UBMSs. THenmeasurementsrcanepe ana- 


lyzed to Sugeest Strenetns and weacxnesses of tne DEMSs. 


C. QUANTITIES TO BE MEASURED 
Tne generally accepted performance index tor a DEMS is 


the response time. Defining tne response time as the 
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primary performance index is the scope of this research. 
However, the respons? time is based on several factors. 
emene these factors are tne time to process tne query, tne 
mrme to access the data, the time to process data, and tne 
time to return tne date. For a DBMS running on a maintrame 
computer, the effects of other workload on the response 
time must also be considered. 

A measurement of the response time is more significant 
when measurements of its components are provided. Some 
Simplifying asSumptions may be made. The first such assump- 
tion is that tne rate of accessing data in the database is 
constant. Phe=nsecond is thal itwenaratrere? rrenurnigee »yproq 
eessed data is constant. However, the time involved in the 
Precessing ‘of queries and theetime-invodvedo in,» tae: »proces- 
Sine of data may vary greatly among datadase operations. It 
order to recori the variance of time among tne operations, 
tests must be devised wnicn will indicate tnese components 
tor all supported operations. 

This thesis focus2S On measurements of the response 
time. A development of a system to measure components of 
coe "response time is discussed. Tne system involves the 
rpeneration of a synthetic database. The system also méa- 


Sures the poenchnarked macnine in using that database. 
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ge BENCHMARKING RELATIONAL DATABASE MACHID 


A. THE BENCHMARKING ENVIRONMENT 

The research done in Support of this thesis has been 
perforned in a complex environment. Tne complexity involves 
multiple macnines and multiple operating systems. 

A Relation Generator (RG) of synthetic relations has 
been developed uSing Pascal (i.9., IM’sS Pascal/VS) in a 
multiuser environment (VM/CMS running on IBM 3453S). RE. is 
us2di in a Daten environment (MVS) on the same macnine. The 
relations are generated in EBCDIC-cnaracter form. They are 
transported to a UNIVAC 1192 via tape. Tne EBCDIC files are 
tnen loaded onto tne nost (i.e., tne UNIVAC computer) and 
translated by the host into ASCII files. Tnese ASCII tiles 
are finally loaded into a backend database macnine ([(i.6., 
Britton Lee’s IDM 50). 

Tne backend macnine and interface software for tne 112% 
series computers are marketed by the Amperit Corporation of 
CHatsworings -California,e: asa tne: RDMs 119¢. Additional mea- 
SurementsS can be made by bypassing the part of the query 
processor that provides terminal support. Thlseis*raccomn— 
plished . by comnunicatine directly with the query “processor 
via compiled language statements (i.e., COBOL). This does 
not completely bypass the query processor, because the query 


language is interpreted and cannot be precompiled. However, 
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tne results snow that query processing do€s not represent a 
Significant portion of the response time it the nost works- 
load is lignt. Tne terminal nandler represents aiso a small 
portion of the response time. Therefore, the only advantage 
to the use of compiled programs is the option of running the 


process as a background job. 


B. THE ARCHITECTURE OF THE SYSTEM 

Tne arcnitecture of tne system encompasses two major 
areas. The first of these areas 15 the internal architec 
ture of tne IDM 522. The second area is the nost system 
software, i.@., the user interface which runs on the host. 


1. Tne Basic Machine Arcnitecture and Various Contigu- 
rations 


The IDM 508 is made up of several modules connected 


to a common nigh-speed bus (See Figure 1). Tne database 


processor is a 6-mnz, Zilog 7Z-8¢24 series microprocessor 


which performs the DBMS functions. The! @odine “Yor “tr 


(D 


Mmacroprocessor is written largely-—in—toe_C programming lan- 
esuazge, along with some assembly language routines. Lt 
comprises about 350 k-bytes of machine code. An. optional 
module, tne database accelerator improves tne system perfor- 
mance by implementing in hien-speed, Special~purpose 
hardware sone of tne DBMS functions normally performed . pv 


the database processor. 
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Figure 1 - The IDM Bus Architecture 
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The cache memory is composed of 64k—-pit dynamic a4™ 
Ghips. Tne basic configuration (at tne peginninge of tne 
tests) includei one-nalf megabyte of memory. Up. vo. ‘siz 
megabytes of nemory can be supported. During tne testing 
period, configurations of on@ and two megabytes nave also 
been used. 

One to Your -diSk controllers may pe ilastallied. 
Fach controller Supports up to four six=nundred-megabyte, 
hard disks. A tape controlier may be installed to facili= 
tate backing up and loading data. 

Two ‘Standard fost interfaces ere available. A 
TREE=-<4E2 byte-wide parallel interface is available for con- 
nection to mainframes and minicomputers. A second interface 
Gan. be used to provide multiple BS—-252 serial ports to 
microcomputers. A special obyte/word interface tor com- 
Munication with UNIVAC host computers is supplied by the 
Amperit Corporation. 

2. Tne Database Organization 

The IDM 50¢ sottware supports the relational data- 
base: © mode 1 . Data “irs pistored tomate cdisic.imertwo “ghowmicatl 
levels. Tnese levels are tne system datebase and tne user 
databases. At the top level, the system database contains 
five system tables and thirteen database tabvles., The systen 
fan leisiscon tain cbnformetion on nardware configuration, date- 
bas2?s and current uSage. The thirteen ‘datarase tadles 


comprise tne data (udaet a onary: Tney are used to 
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Store, information: about relations’, 5 avtrionutes,’ ' users’) and 
security. A list of the system tadles and the database 
tables is given in Appendix A. 

Althougnm access to tne system database is required 
tor. the, creation: of. a user, database, can existing userodata= 
base cap. be accessed. directly, )Lis&s,s, «without going througn 
the system database. Fach user database nas doth datapase 
pemees , andy user teoles« The datapase tables are «stored 
within the . user database and may, be accessed in tne same 
manner as user tables. 

roe, basic unit of disk @ccess iS a 2k~byte -bioex. 
When a database is created, a space ailocation in pilocxs may 
be requested. This. allocation may be increased.if »sneces- 
sary. Both system tables and database tanles are used by 


the system to compute pnysical addresses. 


3. Tne User Interface 

Tne. user interface is accessed by invoking an pro- 
eess on the host. Tnis process is an interactive query 
processor. Tne query processor parses tne user’s queries 
written in the Relational Query Laneuage (ROL). Eo is 
Amperif’s implementation of Britton-Lee’s Intelligent Query 
Language (IOL). Alternatively, queries may ce submitted to 
the query processor from a compiled COBOL or FORTRAN pro- 
gram. Suomitting a compiled program as a batch job, the 


us2r may bypass the query processor’s terminal sanaler. 
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However, the bdatcnh job still depends on the query processor 
for parsing of the auery. 

The Relational Query Language (ROL) provides opera- 
frons and facilities simiter’ to tnose available on 
relational DEMSsS currently running on maintrare computers 
and larger ninicomputers. ROL also allows queries to be 
pr2-parsed and stored within a database. These stored com- 
mands limit the time reyuired in the host for parsing and 
reduce the time required in the backend tor the database- 
table lookup. Additional information on RQL may ce found in 
fe, 2 and Jib. 

Communication with the IDM iS via a SyStem process, 
RDMIO. RDMIO supervises communications pvetween user proces- 
ses running on the nost and the hardware interface to tne 
pe (See Figure 2). Up to ten users may access tne RIM 


simultaneously from a single UNIVAC nost. 
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Flgure 2 - The IDM/User Interface 
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III. THE BENCRMARKING APPROACH 


A. A MULTI-DIMENSIONAL PROELEM 

Creatine a obencnmarking system poses a problem with 
several dimensions. The problem can be broxen down into two 
major areas. Tnese areas are modeling and measurement. 

Tne modeling problems can be categorized as DsMS- 
dependent and database-dependent. The DEMS-dependent 
moieling problems are related to DBMS scnema and syntax. 
The database-dependent problems are related to tne charac 
teristics of the database and tne application to be mocelecd. 

a. DBMS—tependent Problems 

Tne three widely xnown database models are the 
Deerarcnical, the network, and the relational. It fas been 
Shown that databases and applications based on one of tnese 
models can be translated to any other model. However, there 
is no accepted basis for meaningful comparisons of tneir 
performance measurement. As a fiTst. step, tests have been 
MeTLOrmed in support for establishing such a basis for DEMSSs 
having the same underlying model, specifically the relation- 
al nodel. 

db. Database-dependent Problems 

Tne database-cdependent problems are reépresenta- 


tiv]s of existing databases and the applications which are 
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us@éd on tnem. Existing databases vary in tne complexity and 
in the efficiency in which they nave been implemented. 
Tyese varieties are partiy due€ to the physical data tnat are 
represented in the database and partly due to the ovproegram- 
mers” abilities to construct tne database. Additionally, 
the applications which uSe theSe databases also model thre 
Piysscal deta represented as well as the informetion te- 
quired of tre database. Thus, both exiStine databases and 
applications must be modeled. Tne key to an effective and 
general model is creating one wnicn represents common  cna- 
fracteristics. The eharacteristics or databases and 
applications nust be carwfully stucied prior to the design 
of a general and eftective model. The contrasting nature of 
existing databases and tneir applications present an eéx- 
tremely conplex modeling probiem. 
2. Measurement Preblems 

DBMS benchmark measurements, as a standard, may aiso 
represent a comparison ot DEMS performance. This standard 
may be Sitner ansolute or relative. Absolute measuremerts 
assume a fixed standard. Relative neasurements may provide 
rankings witnin a group of DBMSs. Tne measurement ot the 
Pesponse tine for,reiative: ranking, is,ourggoal. 

Experiments< must) DeeconstPuctedecarefuliy. | andstthe 
environment must be controlled to provide useable,  eccvrate 


measurements. For example, in pertorming researen tor this 
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ies 2S: 1 Tt. | omase (been: mo taiceds pnat the load, on, tne. nast. . can 
Sienificantly affect the response time as Seen by the user. 
Similarly, tne response time is heavily atfected by tne time 
Peguired to - Sturn ¢ the Wate to the wser | 2tnocthe -eécreem 
These effects must be minimized in order to odtair measure- 
ments wnicn more accurately retiect tne pertormance ot tne 
backend database machine. Resolution of measurement pro- 


plrems is discussed in Section V.B- 


B. RESOLVING THE MODELING PROBLEMS 

Altnougn the modeling problems cannot te eliminated, 
Steps can be taken to ninimize the errors introduced by tne 
modeling process. 

1. DBMS-dependent Problems 

Two asSumptions can be made to minimize tne DEMS- 
dependent modeling errors. Tnese assumptions concerr the 
format of the data and the operations used to access tne 
data. 

Mme “Tirst assumption iS that all. relations. rare 
stored in tnird normal form (3NF). Tne use of 3NF minimizes 
the possibility of inconsistent data. While real databases 
do not use SNF, this fact doesn’t discourage our assumpticn. 


The benchmark iS designed to provide a measurement of LTEMSs 


performance. It 18. fet intended to take into consideration 
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the abilities of those persons who will design tne datanases 
(althougn ease ofr us? may de a consideration in sore 
instances), for they may not understand the tneory cfr SNF. 

The second assumption to pe made is tnat tne yuery 
languages used by tne DBMSsS are logically eyuivalent. Al= 
tnougn differences in syntax do exist, tney gené€rally do not 
Sreeet the preadth of available operations. Therefore, 432 
cComnmon set of queries can be implemented in. the DLEMss 
individual syntaxes and provide tn2 identical logical re- 
sult. Any variations to this should be noted with benchmark 
results. The basic set of experiments include selections, 
projections, joins, updates, insertions and deletions. Ad- 
ditionally experiments snould be performed which test the 
performance of any peculiar or powerful operations which a 
DBMS may have in addition to the standard set. 

2. Patabase-dependent Propiems 

Toe eliminetion of database-dependent modeling pro- 
blems involves two fundamental areas. The first of ~thess 
areas is the feneration of 2 Syntnetic database. Tne gene— 
ration -of such a datatase allows the use of data which is 
Peneraily representative “of erlstin databases, but not 
specifically representative of any one. The design of the 
syntnetic databas®“s cnaracteristics snould pe broad. This 
Snsures >) thet? it can. He adapted to’ realistically measure tie 
perrormance- of a database with it's” own’ characteristics. 


These characteristics incluae tne sizes of tne relations (in 
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tne number of tuples and tuple lengtn) in tne datarése ana 
Tae Length,.of a tuple.relative to bDlack size of the storages 
medium. 

The second area involving database dependency 
involves tne applications running on tne database. A syn- 
thetic workload is required for the same reascns as tor. the 
syntnetic datapase. The Ce&sien ar the Syntnetic workload 
should be broad enough to provide enough results to be abie 
ro ftuily simulate, different. applications. THe workload is 
designed witn two major considerations. The tirst conside= 
ration is support .of the,. basic. relational operaticns 
discussed .previously. An, additional, consideration  tares 
Mitdes. account the varyine access patterns of existine data- 
bases. For example, a given application may reveatedly 
reizaese. only one. tuple at a time. Anotner will retrieve 
many in one operation. An important, ciaracteristic is tee 
Locality of. the data retrieved by operations. Tnis charac 
teristic —may. produce different levels of performance with 


different indexing netnods. 


C. THE SYNTHESIZED DATABASE AND WORKIOAD 
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Ps. Necessary tO obtein the SEt which can be used on da” wide 
mance of DEMSs. It: 2s also important ttat tnis set dees aot 


favor any DEMS or class of JBMSs. 
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Two approaches could nave been taker in octtaining 
neasurements. One approacn would be to pertorm tests on 
existing databases, The @Cther approach is to do reasure- 
ments on a synthstic database. The latter allows tne 
preates, tlexitility in performing operations on the deta- 
base. This is because tne scnema of a real database might 
NOt provide a suitable Structure for performing @ test of 
some operations. Tne schema of a synthetic database, on the 
Other hand Tinimizes any bias resulting from designing tne 
tests around a particular database. 

je research tor this taesis 1s performed in <¢on— 
junction with evaluation of relational database macnines. 
However, tne installation nas no relational databases. 
Therefore, any tests on the DBMS would nave to be pertormed 
on 2itner a synthetic database or a database converted from 
anotner model. Since the use of Synthetic databases sup- 
ports amore general approacn in bencnmarking, the cheics 
has been made to generate sucn datapases for pencnmarxineg 
TESTS « 

2. Types of Synthesized Data 
Syntnesized data snould nave one major characterist- 


Gs tre types of date should be Droad EnouEen to Test tHe 
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supported - DEMS operations of different types. of ftelds 
(i.e., values). For example, in tne researcn performed for 


this thesis, the first two attribute values of eacn relation 
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nave tne same numeric value. However, the first attribute 
value is stored aS an integer and the second aS a character 
string. One set of tests selects tuples based on tne inte- 
wer values; a second set of tests selects tne same uples 
based on Character values. Response times may be affected 
By the processing differences rélated to tne data types. 
Additional differences may result from tne time required tc 
eormay the data for output. 

S. General Schema of Synthesizea Data JU 

The synthesized data used for tnis thesis has four 
feus Or Teletions. Eath set nas several relations with 
different numbers of tuples. Sach relation in a set has the 
same attributes. The attricutes are similar amoung the four 
Seis, Oirrering only in number and length in order ‘to pro- 
fige a fange of tuple lengtns. Table 1 snows tne range of 
tuple characteristics. 

The relations are stored in several databases. Two 
databases are used for testing single-relation operations. 
te. First database contains all of the relations ased” in 
Single relation testing. The second database contains relée- 
tions wnose tuples are of 10@ bytes and c2@@ bytes. THis 
database uses compressed fields tor strings (i.e€., trailing 
blanks are dropped) . Several databases are used t» provide 
relations for testing join operations. For testing, it is 
desirable to Spread the join operations over the two disks 


20 ne system. A full implementation cf this desir 
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Table l. 


Tuple Lenetns 


Relation Lovical Sizes 


Relation Physical Sizes 


Attributes 


Attribute Types 


The Relation Characteristics 


120, 202, 140U¥, 2000 Bytes 


5@0, 1000, 2508, S¢@e, 1¢"%20 Tuples 


5@ kilobytes to 20 megapnytes 


14 ( for 10@ byte tuples), 24 (for 
otner tuple lengtns) 


Sequential Integer, Random Integer, 
Collated Alpnanumeric, Blocks Sets 
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@atabase placement iS not possibile, because the storage 
allocation algoritnoms prevent us from controlling over the 


srorage joecation of specific relations. 
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IV. GENERATING SYNTHESIZED DATA 


A. A PARAMETERIZED RELATION GENERATOR 

The Relation Generator (RG) is a parameterized program 
fOr generating relations for a database. RG prompts the 
weer Concerning the characteristics of a relation. Fires 
ane user is instructed to enter the relation name and size 
(i.2., the number ot tuples). Then, the program requests 
data about each attribute. The data equesteac includes 
attribute name, value type fi.e., integer, String, etc.) and 
distribution of the attribute values. Tne relations gene- 
Paved are stored in ASCII files to simplify transfer between 
systems. 

1. Capabilities 

RG contains routines to generate seuuential numoers, 

random numbers (eitner uniquely or nonuniyuely), and cna- 
racter strings in collated oraer (See Appendix B). The user 
may. Giso specify @a-file wnich contains a set of values for 


an attribute to be used in generating attribute values. 


fb 


This set is called a “value-set” and the file is’ called 
value=set file. It-.1S produced) ty the wtility proerans, 
Value-set Generator (described below). Tne actual range ofr 
watues- —Tron tue file to ibe used fornan. attripute ps “ecaited 
the attribute’s domain. The user specities the number of 


values from tne value-set to be included in tne attributes 


Ar) 


me 


Fe a a _ Sk 7, 
re wot rt 18.0 us yee a! 
—_.” . 

Bteq 6. ‘on 
* Suitirw era F 
teins = to zat? oreei-teede FF ae e020 


om st Petree “3 18708 on seaqute’ ob ab tee 


— ee ae 


2ahe8 


9 


re roses re 
10% cantaley 


“oe 


} _— 
7) _ .) 7 fl 
_ ) SaewaupeS nbtacse Ans cant te ae 91 fe a9) 
ins ad ad a : sae” : rae | ve 
- =. seresiys Sv Ab. 2ht eles sae Jods 
7 nd f . 
_* ea seerize «teserc1 PS “a9 ouLee verene feats 
=m >  Shadesier sat. (2a lay srontints ¢ a Yovno 9am . 
aoe S€¥seen2 ythisete 2! 29h DR, ieee a betet2 ets be 
ca els 7 n : - a - 7 
a? 7 P| 
¥ 
| ‘tile of 
(estheurse *;arsqes a2 2epireon zolerans os 7 ; 


i 
ee>. SHE . fvisuvs.aur 
- 
} = 
acu S67 5 +) Z2£2i0080¢ 


z yreite> 


5 
Ps 
il 
. 
i 
th 
al 


"SSS P18 PV eLIIY va 
r ~- 
* ¥ «, ial 
a a ee - 
: 
481 } e 0 
! 
b c= al 
* 
* ) bi. 
pa ape 
‘ 
- 
. 
1 
- 4 


ty etareneni $4 en ‘aa* ae 


DOR ei Ja. 
' _- : c 
“ G. peg isiteuy te 
“ot paagh'ee,er- eve 
1 > 
122i) srt eTLAROL 
\ c : ’ 
4A . 
The g 2 o27°7 4D ec? is 


1 vs snp free venga): ergener n05; 


- 
Teniy- patsb ios a apelaee +94, 


- 
oF tee ; 


@ 


7HOD SLD alg = xais "2 oaks, 


© 


a 
erud soe 


‘Joersutev” 2 sales" &- 342) 


domain. It iS not necessary that the domain contain all the 
values in the value-set. RG reyuires tne user to define the 
SiStribution of tne attribute velues. THe aisStribution is 
either in discrete blocks or random of both. A discrete 
SaStridution in which tne attribute values are randomly dis- 
tributed may be created by Sorting a relation containing 
@iscrete blocks on a random number attritute. 
2. Ine Development 
ae The Developmert Snvironment 

RG «US written in 2BM Fascal/V¥s, running “wader 
the VM/CMS operating system. VM/CMS is an interactive, 
multiuser operating system. Beacause of operating systen 
limitations, RG has been converted to a MVS (batcn) process. 
Standard Pascal syntax nas been utilized as mucn as possi- 
ble. Pascal/VS extensions to tne language nave been used. 
Agaitienaliy, some of tne file descriptor information is 
Speecaric to the operating systems. 

db. The Development Process 

The first Step in ars development ot tne system 
is -tne drafting of a modular framework. Persons are then 
assigned to develop tne different nodules of the program. 
Mee -Gitrerent todules include: the mdin program, the main 
generator module and the individual value-type ygeneratcr 
modules. The individual modules produce” specitic types of 


Yasues. for .the attributes. 


Raa? 
neh er 7 
wes - plans a ne 


-_ « 


ne aan ‘ts dase | ay a it 
Paeina siud Peers ont opt ie 
doariees a aniztes vo * anasend od 6 


=. 5 ra basse daddyle. neaEet st Tt 


7 ~~. 


- : ca 7 


wa > 
: Sa, of Sa 
cm i i a ae 


i a 
| ioe yeanles eval — 


a wiliones PRL Les £e5, Med 2 trite ar an > 
*, - are 
naibananate oe eh Bey 7Erete abrigilieal Mg oh 


nays ailies ano ‘6 Welln sed Loa} aye pip eliege - vaeute 
| a tapas) SU's of nerrepnoy gee zag 84 jared 831 
oe 


hy 


ee a5 noty 28 pazeitru'nssd eg ue ispeet otshoe 


a aS87 Svea ieedarted say og 7 fle nates ANissesd 
, 36" Snipe autiendats £8: 
| . |  HeTEN <atissany ms e7 ott 9 
_ ~~. : >. | 2230074 jridngb tae aT 


» 
vu 
a 
la 
re 
ha 
Le 
| 
od 
& 
~~ 
~ 
, 
=a 
IS 
| 
¢? 
| 
(bh 
-_ 
~~. 
Ed 
i 
a 
44 


- 5 7 - Le © a un ; 
~ Pease Say tc ycrarecisvsf any FP eave, 122s ‘ont : iD 
- ” . . { . ~, 7 
_ MBS Biacencetc® 3 § .stowenitd gales te sol? teth sag om 
(a 7 7 
i pt tT) ie - 
a -TRtROM, Shr to ‘alurom jee eRehe eee oe piaren C97. ansiine 
clam S27 »merss79 7 Sad obiew? Leitben Feaiey ve 
“ i i . a 
TaVe+60S% ShXi—-enisv “lava isn? ge% bee elbdsom 700 sn 89 
{ i j 7 . 
4 a . oy a _ os - =~ ay 7 : od m ‘ 
: %6 2 r or icoqe 22vfc1qg e3iubem feotkelont set  vatetow 
. ‘ <2OG 79 2TI38 SEP Io, 
; ' 
vf. 
f 4 - 
’ i Ww 7 
‘ | ey 


The system has been developed using modern soft- 
ware engineering techniques. The different mocules nave 
been debugged separately. Program narnesses, wnicn contain 
no logic except to invoke a procedure, Nave beer used tc 
test procedures and Subprocedures. Modure Stues., whick 
Simulate the actions usually performed by procedures, nave 
peeén used in place of the procedures to test the main pro= 
gram and tne main generator module. Once dehugged, tne 
moaguies bave been integrated with tne main program. 

The responsibility for generating relations nas 
be2®n assigned to one person. Additional development otf the 
system involved several items in addition to déebuggine. A 
Uiaeity tO @eneTate value=set files has also been created. 
Thus, the otner members of the team nave heen freec to work 
On other phases of the project. 

e. Design Probviems 

Two major problems have been encountered in ths 
pr@paration. of RG. fhe first provlem is the size-of tne 
relations to be generated. In the original RG desien, all 
proine tinged lists of attripute values resiae in the prime- 
ry memory sinultaneously. THE size of the Pareest, relerion 
that has heen generated is twenty megabytes. Pris” Teguire's 
twenty megabytes of tne virtual memory space just to store 
the contents of the lists. Additional space would be -re=- 
guired for the program and the overhead associated with 


ifjuked “ists (l.e., .poinmters to merory Locations). TAs 
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exceeds the virtual memory space available to a single user 
under YM/CMS. 

This problem nas peen partially solved py acces- 
Sane Sequential files as a substitute for tne linked Jists.. 
Therefore only one list of attribute values at a time is 
evored in the primery nemory. However, -@ Jipked 4250 oF 
some of the longer attributes generated reyuires over two 
meeabytes of memory just for tne date, without ‘considering 
the space required for pointers. 

Tne second problem concerns the transportation 
of the files of generated relations to another System: 
Under the VM/CMS system at the Naval Postgraduate Scnool, 
Seca user is dllowed°a Iimited amount of file space. LEAS 
amount is much too small to nold nost of tne relatiors gene- 
rated. Additional space is available on a temporary (i.e., 
One-day) basis. Also” important "isthe Fact* "ttat- waite 
VM/CMS files can be offloaded to tape, tney are stored on 
tape in a non-standard format. THerevisS noIMmMtiLlitysproeram 
bo transfer’ VM/CMS files ’to tape in*standerd format: There 
i e159 NOtTRtility” proeram’ te exrchaneetfiles? tetweenys tHe 
Gapes of YM/CMS “format and* the’ tapes of MVS" rormat. 

It is apparent that VM/CMS is not” the ideals en= 
vironment in which to run the system. Therefore,” © 1128 fas 
he@n necessary to convert the system to run in the “VS en- 


vironment. The MVS SyStem writes tapes in tne standard 
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format. It also allows the uSer to have a much larger 


virtual memory space. in retrospect,  1t makes: sen 


n 


e. 10 
develop the system in an interactive system (i.e., YM/CMS). 
Fast turnaround contributes to faster program developrent, 


and tne interactive environment maxes cdebvgeging easier. 


B. A MATRIX OF RELATIONS 
ime relations generated by RG are desiened to. ‘<upport 
experiments over‘a ravge of relation sizes “dnd cneracteris= 
tics. These sizes and characteristics are selected to allow 
Maximum flexibility in pursuing experiments with a minimal 
number of relations in the test database. Tne parameters 
G@iscussed below are ‘those ‘of the relations produced in sup- 
port of the benchmarking. 
1. Standard Templates 
All of the relations are characterized by the same 
general template. This template iS Shown in Table 2. Four 
specific templates are derived from the general one. These 
fevoiates. correspond to the four tuple léngetas used far 
testine (i.e., 100 bytes, 2U@ bytes, 1804 bytes and 2¢k¥ 
bytes). H2eh template is used te senerate the relations of 
various sizes (5¢2@ - 12,000 tuples). Tnus most of tne tests 
can be run on many relations by cnanging only tne relation 


name (or tne values of tne range variable) in the queries. 
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Key —- a sequential number to be stored as an in- 
teger tield 


Mirror - a sequential numvoer (same as xev) to ce 
Stored aS a Character strine 


Random - a random number to be stored as an integer 
fiela 
b4 
Raniom Uniuue —- a unique random number to be stored as an 


integer field 


Collated “=a cnaracter string to. be stored in: alpaenetic or= 
der 
x 
Letter - @ random alpnabetical letter 
# 
Sets = pblocKts: of. values, J7Tomvalue=set tiles. 


*= not used in some templates 


# multiple attributes depending on tne tuple length 
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The relations are designed to provide flexipility in 
testing. Ideally the tests to be performed are xnown tefore 
desiegnine the relations. however, the results frem some of 
THe Tests may. suegest. a need for additionelds tests whicnh nave 
not been previously considered. Accordinely, the relations 
are designed to allow the deSign of additional tests without 


Penerating. nore. relations. 


C. THES GENERATING PROCEDURE 

The eenereting procedure consists of three phases. yal ole) 
first phase consists of designing experiments and the rela- 
Deoas to be used in those experiments. Arter the relations 
Rave dDeen designed, they must be created and transnorted to 
tne testing environment. 

Generating relations is a simple process. Pirst TGe (is 
used to generate any necessary value-set files. Then, RG is 
used to generate relations. RG nas been expanded to produce 
a description file. Tais file contains the attribute names 
eno Characteristics of the attripure veilwes in the relation. 
The description lists both the format of tne generated file 
fae ne format of the relation as it is to ve stored in the 
database. 

1. Ine Generator System 

Tne generator system consists ot two najor programs, 


the Relation Generator (8G) ang the Value-set Generator 
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ee Otner programs and dedugging aids may pe necessary, 
depending on the environment(s) in which the system is 
implemented. 

a. Tne Relation Generator (RG) 

RG creates a relation file based on input from 
the user. It consists of four types of modules: the main 
eroeram, the main generator modude, tne individual generator 
modules, and tne collating module. 

The Main Module —- The main RG module ‘contains 
very simple logic. RG prompts tne user for the cnaracteri- 
stics of the relation being generated. First; the “name ‘and 
size (in tuples) of tne relation is requested. Tnen, tne 
wser is asked to determine the characteristics of the ‘first 
attribute. THe <attribute chareacterictics “ere “collected in 
an attribute record (See Table S$). After tne module obtains 
Dee enecessary attribute characteristics, it invo“xes tne main 
generator module. 

The main generator module, as explned inthe next 
section produces linked lists of attribute values and re- 
turns to tne main RG module. RG then invokes the collate 
module which is detailed in the sequel. The colléte rodule 
produces tuples by concatenating sets of attrioute values. 
After tne relation has been generated, tne user is given the 
Option of eenerating another relation or ending the process. 

Tne Main Generator Module —- The main generator 


moaule as invoked to produce Gach set of attrivtute watues. 
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Table. 3 The Scnema of an Attribute Record 


Attribute Name =~ .assiened attritute name 

Attribute Type - data type or attribute values 

String Lengtn - used ,for,s£Lrine.ty pes 

Lower Bound - first sequential integer and lo- 
wer bound for random integers 

Upper Bound - upper bound on random integers 

Generate Mode - data~type.distri bution 

Value Set Name - value-set tile name 

Relative Proportions -.discrete. distribution specifi- 
cation 

Seed - random integers 
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The characteristics of an attribute are passed to tne moarie 
in an. attribute record. Using this record, tne main module 
invokes one of several individual generator modules, depeén- 
ding on tne characteristics of the attrioute. The 
BoaszvVidual generator module produces a linked list of attri- 
mite. Values with tne aesired type ana distritutrion, . and 
returns tne list to tne main generator module. The main 
eenetator module opens a sequential file, writes the attri- 
Mite values into the file, closes the file, and returns to 
the main RG module. There are tnerefore several Sucn tiles, 
known as attrioute files. 

Codgbate. Module =< The collate module acts «4s 4 
eollator. Lt physically concatenates strings, of attrioute 
yatnes to form 2a tuple. ,. It 18S invoked to assimilate ail the 
attribute values in the attribute files into a file of the 
relation. Intormmation describing sthe attributes ois — passed 
fousuee.. collator as an array Of attrivute iresords:. The 
eoutlatorm rirst opens the reiation fise. and all the attri- 
pbuse files. The relation iss generated @ tuple ay .e time. 
One attribute value trom 2ach file is read. The values are 
Goncatenated to produce a tuple. TRe  TUplLe 15 The writ ten 
ho st0e .felation file. The .collator repeats this process 
Until all tne tuples have been produced. 

b. The Value-set Generator (VG) 
The Value-set Generator (VG) is a simple utility 


or Setting ap value—set files for RG. VG asks. for, the name 
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and size (i.e., tne number of values) of tne valve-<set file 
to be created. The values are entered individually and 
stored as strings in a4 random—access file for use by RG. 
2. The Conversion Problem 

Converting the program to run in tne batch envi- 
ronment involves several tasks. Tnese are tne conversion of 
interactive programs to batch programs, the submission or 
gobs to tne batch system, and development of tne additional 
Statements required to use of the batch file System. A= 
tnougn tne programs nad already beé€n debugged in tne YM/CMS 
environment, extensive debuezine has been necesSary after 
conversion to MVS. 

Conversion * cof “progerans*from VM/CMS<to>-MYS ‘ds 
not a simple process. A virtual. card deck is-created in a 
VM/CMS file which contains the source deck, tne input data 
and the file data required py tne MVS system. This tile is 
submitted to tne batcn aueue. The tapnt ror Re (i.e. ; “tee 
user’s replies) are in tne cari deck with the proerar. 

Rlthewen = "LG 2S *notenecessaery, tre *source “code 
whicn generated the instructions to tne user for tne input 
has peen removed for tne MVS versions. Tne VM/CMS version 
has been modified to create a file wnicn contains tne user’s 
responses to tne program’s prompts. 

Differences onetween the bdatcn and interactive 


Systems caused the difficulty in program conversion. THe 
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batch systen, MVS, requires much more in the way cf file 
Parameter specifications, and is much less forgiving when 
eeror conditions exist. Tnere are some rror coerdi tions 
wnicn tne user can not foresee. For example, tne system may 
Pnitially allocate space for a relation file on a volume 
wnicn does not have enough free space to cover secondary 
allocations. When tnis happens the program is aborted. 
However, it is not possible for the user to specity 4 parti- 
Gular disk {i.e., “one with sufficient space) for file 
storage. For the two largest relation files (fifteen and 
twenty megadytes), it nas been necessary to write sacn of 
the relations into two separate files on the batch system. 
The two tiles were then combined wnen tney loaded into tae 
database. 
5. Transporting the Relations to the Testbed 
a. Transporting the Data to the Host 

The transportation of tae relations: to tne ost 
2S a two=sSt2p process. The £irst- step Ss the transtar “oF. 
tne relation files fron tne MVS secondary storage to tape. 
Pesystem utility is used to accomplish taAis. The tapes arse 
tnen transported to tne nost, tne UNIVAC 1100, and a similar 


Utility prorram:'is used to load data into the host secondary 


ty 


storage. The nost utility program translates the BCDIC 
vape: fites into ASCII disx fites. 
b. Loading Data Into the BacKxend 


gne relations are toaded into the backend using 
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aevencor—supplied utility Called a translator. This wtility 
Geanmptes the.user for information, about the: source file, the 
target database, and tne target relation 

The translator utility may be run interactively 
or with file input. The database into wnich tne relation is 
to be loaded must already exist. The relation inte wiich 
Gata is loaded may or may not exist. Database name, nost 
Tile name, and relation name must be Supplied. Additional- 
iy7, for-cCach attribute the attripute name, length of source 
(in ASCII characters), and type of value to be stored in the 


database must be supplied. 
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V. GENERATING TEST PROGRAMS 


A. THE TEST PLAN 

Tae yescneral test plan calls for several different tynes 
of 2xperiments. Among these are experiments invoiving only 
one relation (i.e., selections and projections) and experi- 
ments involving more than one database (i.e., joins). 

1. Experinents Involving 2 Single Relation 

The selection and projection experiments are de- 
signed to measure tne system’s performance in retrieving 
gate, .tfom.a single relation. The response times measured 
are the sum of four variables: the time to process a query, 
Teearime to.access tne data, the tine to process the data, 
and the time to return the data. The time to process the 
guery..is.defined as tne time to parse tne query. BY .care= 
Pully constructing sets of experinents, tnese vaerianles:- can 
be estimated. 

Since..the. time to process a query is so small,. it 
may be ignored or combined with overnead for most experi- 
ments. hor .experiments where’ it 18 sienificant, tHe 
guery-proccessing time is minimized to prevent it from domi- 
natane tne tine .measurenent, resulting in a- 165s. “oF 
precision. The RDM 11¢@@ allows the parse tree of a query to 
Se stored in the database. This capavility allows the 


replacement of the processing time, which is dependent on 
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tne nost, witn tne data access tine, wnicn is depencent only 
On the backend. The additional data access time is the time 


to access tne command in storage. Tnis is the s 
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all stored commands. 

Tne largest variables are tne easiest to measure 
with precision. Theretore, they are measured tirst and then 
eliminated to measure the smaller variables. 

The largest variables are likely to be those repre- 
senting the tim® to access, process and return data. These 
ean De measured with simple retrieve commands. A time 
measurement of a retrieve which returns all the attrioute 
values of the tuples in a relation includes the times of all 
or the four variables. However, a time measurement using an 
agerepate function (e.2., count, which returns a single 
count of tne tuples meeting tne qualifications of tne avery) 
GCliminates the time to return the data. THUS: PALS Yunctisn 
can be used effectively to measure the time to access and 
process the data (tuples), i.e., two of tne four variables. 

Further, an assumption is made that for simple von= 
mands the processor can process data at a rate which is 
faster “than. -the -reate that data can be trouent into. tne 
memory for processing. This allows the processing time to 
be ignored. Therefore, tne measurements reduce to a measure 


of the access time. 
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Having quantified tne larger variables, tne time to 
process data may be investigated. It nas been assumec tnat 
the processing time is not Ssienificant ror simple commands. 
However, if the commands are made more complex, tnen the 
processing time is expected to increase. Witn. a sufficient- 
ly complex command which involves a small data-access time, 
fac. 6pmreccessing time may become significant. Therefore, 
experiments are conducted which minimize data access put 
wary in complerity. It is of interest to determire when or 
if the processing time becomes measureadle and Significant. 

It is expected that projections operations will 
encrease tne processing time. Therefore, several Tests are 
Bpordpriate. for testine projections. The first Set of.tests 
measure the effect of projections on the processing time. 
me. second. set checks to:-see if the processing time is 
affected ny tne type(s) of attrinute vaiues projected (i.e., 
antecer, strine). Tne tnird set of tests measures tne 
mperftormanee of a projection on all of the attributes versus 
a Simple ‘retrieve all’ command. 

After tne time basic variables nave been estimated, 
Onnmer pertormance factors are investigated. THe use on 
indices can reduce access time. By reducing the amourt of 
geta. brovent-into the memory, the.processime time. is> aise 
reduced. However, the processing time will be increased due 
ToOvtmmex access and searcn. Therefors, 10r same; redataoms, 


tne use of indices may increase the response time. Indexing 
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requires a specific set of tests to measure its vertormancze 
im various situations. The wse of different types of “ir= 
dices (i.e., clustered, non-cluStered, multiple xeys, etc.) 
Must also be investigated. An ‘expected factor in index 
Performance is the ratio of the index size (in blocks of 
Storage) to that of tne relation. 

String compression (removal ot trailing spaces) is a 
factor which can affect the processing time, the access time 
and the return time. The use of Compression can reduce 
block storage dramatically. THis, in turn, Treduces: “the 
Bccess time. However, it may require more time to process 2 
compressed string versus a non-compressed one, it processing 
reguirés expansion of the compressec attribute, If expan- 
sion is not required for processing, then tne nost may have 
to expand it for proper formatting. How expensive fin time) 
Ss this? Does this compensate for the reduction in thse 
responnse time resulting trom returning a smaller amount of 
data (tne compressed string) to the nost?. 

Other performance factors may be examined eltner 
individually or within other test procedures. An example or 
tfis 15 "the use ot different. tyves of attributes (i.e%, 
integer versus string). A complete series of tests can 0»e 
devaioned to test this issue in detail. However, it is also 
appropriate to investigate this area in conjunction with 


processing time and projections. 
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Operations involving more tnan one relation (fi.e., 
joins) are aftected by the Same time variables as those 
Miroisi na, only a single relation. Initial testing snould 
involve only two relations. 

It is expected that the access time will become 
dominant for join operations. This iS DecausSe the Same data 
May have to be accessed repeatedly. Memory size nas an 
Bocecy.on £he amount .of accessing required in ajoin opere— 
tion. fe ymemory, size, iS. -larzse. enoush to .alliow-. botn 
relations to be accessed once and left in tne memory, then 
the processing time may become Significant. In tnis circunm- 
Stance both tne access time and tne vrocessing time are 
expected to increase proportionelly to tne relation size. 


The unknown factor is the rate at which the processing time 


increases. However, it may be that neitner relation is 
epost enonen to be neld in the memory for processine. In 
tnis case much accessing must be performed. It may also be 
of interest to examine join performance between tnese two 


extremes. 

The join. should be designed to take. advantage of any 
Size differential between the two relations. If the smaller 
relation can be completely neld in thee memory, then it can 
be accessed once and brovent into tne memory. The larger 
relation can also be accessed just once as it iS brought 


mito. the Venory as a stream. It; On the ormer nand,. tre 
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farget relation is browuzht into the memory, it must ete 
brought into the memory a portion at a time. The smaller 
relation may nave to be reaccessed for each portion ot tne 
Pereer relation. 

It is important to examine the performance of joins 
Botn With and without selection. In performing tnese tests, 
the strategy of the operations should be examined carerully. 
The selection should be performed before the actual join 
Operation to minimize the volume of date being joined. 

eoetner “area cf interest 1S the’ “efrect cf ‘ince 
usage on joins. Perrormance here 1s expected to improve ds 
indicated by tne single relation index experiments. Fowever 
tne specific results may suggest tne efficiency witn wnicn 
the join operation has been implemented. 

if. inecuality joius heave.been implemented, pertor- 
mance testing snould be conducted using tnem. If they have 
not been inplenented, it may be valuabie to know if, and 
with what ditrficulty, they can be simulated. 

Having experimented tne join operations involving 
two relations, experiments operations snould be conducted 
using larger nunpers of relations in one join operation. Ey 
investigating the performance on Rultiple join relations, it 
May be possible to isolate a fixed overhead for all the 


mitial Joins. 
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A general test plan snould be developed tefocre any 

or tne experiments are designed. Pt whould ope *f bewr etre tate 

enable testing to follow different patns of discovery. it 

25 Sxpected that the results of some experiments may suggest 

other experiments. Time must be alloted for tne expansion 
er any test set. 

Fowever, it must also ensure that tne a sutrticient 

faree of data is obtained. The tests must cover tne univer= 

sal operations (i.e., tnose expected of any DEMS). Among 


the universal operations, Known bottlenecks and breakpoints 


‘Dd 


Bre Specifically tested. Lt should also investigate “amy, 
Bpecific strengths, weaknesses or idiosyncrasies of the 


DBMS. 


B. MEASUREMENT TOOLS 

THe response-tine measurements in these experiments were 
taken from the backend-macnine clock. THIS Clive ofastha 
Fesolution “of *1760:.second and®anbaccuracy within! 1/S0=tatorf 


a second. The response time of the backend macnine on small 


relations is dominated by communications overnead. The 
Nininum response time is apout one>second. “$9, of tHE tests 
conducted, tne 1/58-second interval is sufficiently 


accuraté. 
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However, if tne overnead can pe reduced, @ more precise 
measuring device is required. Most maintrame operatine 
Systems provide a-clock with acresolution* in. microseconds. 


This is not available in tne backend macnine. 


C. QUERY SCRIPTS VERSUS PROGRAMS 

Two methods exist tor pertormineg benchmark experiments. 
These methods involve the use of query scripts anc programs. 
The first of tnese simulates an interactive session acces- 
sing tne datanase. Toe actual terminal input 1S prepared 
anead of time and stored in a “run-stream’” file, «known as a 
query script. The host operating system can te instructed 
to obtain its input tron a tile instead of via tne terminal. 
Thus a series of tests can de collected togetner in 4a 
Script: Additionally:- +tne -output. can be redirected tor a 
file, renevineg the .overnead sin communicating witar a 
terminal. 

Tne use of SPatcn programs involves much more of the 
programmer’s time in the development and debugging of the 
program. Development of bdatcn programs also represent 4a 
larger drain on tne nost’s resources. Tnis factor could 
Severely atrfect testing at many installations. 

Since queries must re interpreted wnetner tney come from 


an ybatehea Job Graa Seripty «the useuof. baten apnoagramming erdid 


not offer the advantages of bypassing the query processo 


ar J 


Therefore ys ot nere vis rsamesquestion (whether sor moetsia. daten 
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program would provide superior pertormance results. THis 
Question and tne ease of development of query scripts 
Suezest that the use of query Scripts is the desired metnod. 
If batch programming offers a significant performance im- 
provement, additional testing must be performed using tatch 
noDS. Here it would be wiSe to run a complete battery of 
fests in the interactive environment, followed by a. sunset 
Of these tests in the batch anvironment. This subset should 
be designed to test areas where the bdatcn process may have 


Its most impact (i.e., tne data return time). 


D. INTERPRETING THE DATA 

Vee interpretation of data is a very important part of 
the testing pnase. There are two reasons for this. First, 
conclusions cannot be drawn from raw data. second, Timely 
interpretation enables the persons conducting the experi- 
ments to analyze tne results and identify further testing. 

a  coldection. of raw data is very ferd to interpret. 
Therefore, any results obtained Should de graphed immediate- 
Lys Graphing the results immediately allows rapid 
Heeneiticarion of .@rrons and unexpected results. Related 
results should aiso be eraphed toeether. For example, all 
the results from a query applied to relations of different 
tuple length and relation size snould be grapned togetner. 

Once tne raw data is analyzed, tne graphs may be re- 


fined. Tne grapn axes may be varied as appropriate. For 
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exanple, tne response tine may be grapned against tne tuple 
lengtn, against tne relation size (in tuples or the numeer 
of blocks of tne storage space occupied) and against tne 


Quantity of data returned to tne user. 
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VI. CONCLUSIONS 


A. RESULTS 
Tne results obtained from testing several contigevrations 
of a relational databas® machine have provided a basis for 
meveltoping a general set of benchmark tests for relational 
database machines. Tne pencnmarking tests have peen mostly 
macnine independent. Altrougn a testing methodology is 
provided herein with enough results on certain contfigura- 
p1005, additional testing is necessary. This testine should 
be performed on otner DEMSs, preferably with different cnha- 
Facteristics, to ensure that the test is complete and not 
machine=-specific. Tne resnits of testing s€Lection and 
projection operations are descrited in [4]. Resuits from 
performing tests on join operations are described in Sik 
j.  eeneral Results 

The response time naS been sh wn to be proportioral 
fo the time required tio access the data. This, in turn, hes 
pean shown to be proportional to physical size of the data= 
base. Metno@s usé@d- to reduce the amount of Gata to “be 
brougnt into the nenory tor processing (sucn as indexing ana 
string compression) improve tne response time. 

The response time is also proportional to tre amount 
it Sata returned to the user. fn the case of the EDM 1290. 


the time required to return tne Java. 2s tne largest 
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component of tne total response tine. If tne necessary 
information is obtained via aggregate runctions, tne te- 
sponse time is greatly improved. It ts not possible te 
determine now much of the response time is due to the pack- 
end machine and how much is due to tne nost. Eowever, 
loading tne nost definitely degrades tne response time. An 
analysis of the response time under various load conditions 
mm tne Host may Lead to a distinetion of tne host resporse 
time vs. tne bacxend response time. 

Tne tine required to process queries and the time 
Pequired to process data in the memory are relatively smail 
for the RDM 11d. This may not be true for other systems. 
Therefore, it is imperative tmat these areas be carefully 
examined when adapting the proposed tests to systems witn 
aifrerent arenitectures. 

The results of the experiments Show that DEMSs do 
have characteristics wnicn may ob&®& measured. A well=con= 
ceived series of tests can measure an installation ’s 
performance, and gain an indication of its performance and 
its “personality.” Tnese tests can be used to compare D3MSs 
azainst each other. For the DEMS Impiiementor, the tests 
ais sopro Vide .<a method ,of padeteraiming * =poorly ¢ cinplemented 


parts ot tne system. 
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@e Research Results 

The. experiments. wnhicn .mave. peen performed nave 
Supported two different types of stuay. TAG .hiPstuls “Lae 
actual measurenent of tne backend macnine’s pertormance 
(albeit, with light load and few configurations). The 2DM 
11¢@6 provides a conprenensive (altnougn uncomplete) relea- 
tional model which successfully otfloads DEMS tasks from tne 
nost. Since evaluation of tne macnine was conducted simul- 
taneously with the research, the task ot evaluating it has 
been accomplished. Some ar@€as tnat nave .not. been fully 
favestieated are due. to tne lack of. time. Other areas that 
have not been fully investogated are due to incomplete 
implementation. As an example ot tnese areas, tne use ot ALL 
Maserretrievess is, contigent upon tne number of attrivutes. 
At one point, the use of ALL on a relation with a large 
number of attributes results in only an error messags. After 
Pmsteddation of thaes accelerator... tne, use, of. ALL. nalts. ne 
command. After. tne accelerator is removed, tne provlen lor 
Baltinesepersists..«, Another .defieiency,noted.has been »tpe 


ipebidity: to perform an inequadity join. 


B. A RELATIONAL BENCHMARKING METHODOLOGY 

rae propesed se€t‘of benchmark tests nes Tour prases. 
The first phase consists of preliminary tests designed to 
identify tne best netnod of measuring tne system’s response 


time. Tne second pnase involves isolating tne different 
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components of the response tims. Tne third phase investi- 
sates the system response in specific areas. The. fourta 
phases verifies tne results optained during the phases two 
and three. 

1. Phase One - Measurement Methods 

Most systems have at least one mechanism which pro- 
vides a time measurement. Initial testing is designed to 
identify the one which optimize tne precision obtained ver- 
Sus the @ase of obtaining that time. Once the measurement 
metmod tas been chosen, it is checked to ensure that it is 
accurate enoush to provide the necessary precision. [yrs 
also necessary to ensure that the overnead involved in 
retrievine the time does not reduce the precision of the 
measurements oeing taken. 

If the necessary precision is not readily availatle, 
then techniques are available to increase tne precision of 
the results. These techniques involve performing an opera— 
tion several times and calculating an average. The. “tecH= 
hiques selected must be reviewed for side effects. Tne DEMS 
may have tne capability of internally optimzing performarce. 
for example, tne order in which tne queries are submitted to 
the DBMS may allow tne DEMS cache memory management to 
recguce disk access. 

In the case of the RDM 1100, two different methods 
of measuring time could nave peen used. The first metnod is 


to, ,OUtdin @ time stamp from the. host operating system. 
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Altnougn it may nave provided sufficient precision, it nas 
not been investigated because of the otner metnods 
available. The second metnod is a time stamp available from 
ane IDM. A buiit-in function supplies an elapsed time 
Measurement intervals of one-Sixtieth of a second. This 
provides sufficient precision for tne measurements. Since 
tne Glapsead time is a sufficient measurement, the more 
precise measurement has not been usec. 
e. Pnase Pwo = Component Isolation 
Once an adequate method for measuring time has peen 
weritied, it is used to measure the performance in several 
Specific areas. Thes2 areas are the four components which 
ere involved in eail queries: tne time of process the ‘query 
(i.e@., parse it), the time to access the data in the data- 
base, the time to precess the data in the memory, and tne 
none to. TeLura the requested. These components mey ove 
considered the DBMS’S primitive operations. These primi- 
tives do not take advantage of any metnods used to improve 


the response tine of a given query. They merely measure tne 


performance of tne nardware and software in performing spe- 
cific functions. It nas been stated that a performance 


measurement of some aspects of a DBMS is really a measure- 
ment of the operating system. The operating system does 
SrTect DEMS response. Eowever, in tae case of a Dackend 


machine, this effect iS minimal for some operations. While 
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mts issue’ may ve debated, It 1S not of interest to the 
user. The user is not interested in the reasons why a 
system responds poorly. He is interested only in the fact 
mat a system performs properly and® the fact thet the sy= 
stem’s performance is better (or worse) than that of anotner 
system. He is most interested in tne possibility of optai- 
ning a quicker response time on nis application. 

PHe o system  primitivesroareemeasnured by7e al set * oF 
g¥erses owhich isolate different aspects of the’ response 
time. One set of aueries is designed to return tne same 
amourt of data from relations with the same numper of tu- 
Pies, DOUtehnavine differents tuples sizes. seOncerastuplesis in 
the memory, it takes the Same amount of time to project one 
Spel pute, i. from: sa set of L00=vyteotuples”"as fromea set” -of 
20@G—-byte tuples. The difference in the response time for 
tReet etworqueries isi due only tothe time necessaty to- tring 
the ptuplevinto the memory: The’ times> required -to-- process 
tHE  ogueTyY, “to process the datacand to return tne data are 
the same. 

Pressecond .set.of queries *isedestenedi rotmedsure tae 
time required to return the data to the user. These queries 
return a different amount of data (in bytes) from prejection 
Operations on tne same number of attributes in tne same 


format (fhe., orstringes, © letes) lim relations whieh are of “tre 
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same pnysical size. Tnese restrictions assure tnat tne 
access time is the same, the processing time is the same, 
Bia tHe query processing time is tne same. 

The third set of yueries is designed to isolate data 
processing time. In tnis set, tne yueries return tne same 
amount ot data from relations of the same physical size 
(i.e., identical storage requirements) bsut naving a aiffer- 
ent number ofr tuples. This provides a measurement of the 
processing required relative to tne number of tuples pro- 
cessed. The query processing time, the data access time, 
and the data return time are the same. 

The -fourth set of queries provides a measurement of 
query processing time. For operations on relations oft any 
Sieniticant size, this. is Rard to measure. Even on small 
relaticus, it may not be Significant compared to simple 
System overnead. This s@€t of queries is more complex thar 
the provious sets. The queries are constructed to allow the 
effects of tne tine eleménts (1.e., tne three just measured} 
To be subtracted from the measurements, leaving only the 
query processing time. Considering the difficulty in ob= 
taining a precise neasurement of the query processing time, 
tt May not be worthwoile to determine this valine because of 
Hts Small size. 

The previous daiscussion indicates that tne query 


sets 


py 


re independent. However, with proper planning the 


query sets nay be combined with equivalent results. In tne 
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erapnhm shown in Figure 3, one set of experiments provides a 
measurement of data access times and data return times. The 


‘ 


set also isolates the constamt query overhead .whicn 
includes the auery processing time). 

Figure 3 represents tne response time of two que- 
ries. One query selects tive percent of tne tunvles ani 


returns all of the attribute fields from Gach «tuple. The 


of the tuples. The queries are doth run against relations 
with 1¢¢—byte tuples. The relations vary in size trom 5¢¢ 
tuples to 10,000 tuples. Point A on tre grapn represents 
the tive percent selection on 14,2¢9 tuples. | Point, 5 repre- 
sents tne ten percert selection on 500 tuples. Since each 
er tnese queries returns 582 tuples, tee time to return poe 
data is the same. The overnead associated with eacn query, 
including query processing time, is the same. Tnerefore, 
the difference obetween tne response times represented ody 
Points A and B is tne difference is tne access time and the 
processine time of the queries. Polaut A FepResents a yre= 
trieve on cb? ,609—-tuples,-~ weich-—is-S¢0-—bloces” “oT dTtSsx 
Storage. Point B represents a retrieve on S5¢@@ tuples, or 
wou disk blocks. Assuming that processing time for. tasse 
queries 15 insignificant relative to-ctae access time, fere, 
the ditference in tne two response times is the time to 


access 25@ disk olocks. 


Rha ii 


F sents. ae e 
yee , ~ as 


eve yreie aoetes aer 

r 

eer ei lezsposg = 
7 


a] ows te omits, Sta ioqee? acy sebataatcn a 


° 


7 


i ae 


a> 
jew eetser. SAI 1) 10s" 729 evi? aresiee eu ger a 


7 | ) 4 Let’? i reel inaae Sia te vie 
eI aS is * et a é - 7 4 is 
tc ’ aia : . ; — oo] 


ae 


} 


get eg psy 2pseiee yi) nen? 1eesee tae tease, en ersup 
; ; ad's) CGR 
_% 


' eeorrele: en eu rod ete 2 eit 299 oe? oamuqut sag. 2 


= ie mo7 effe a eTey enotisiet oat eee] ‘que srqenvar arr 


-#91g 87 S88 SL or 2 qi 
wither 4 inset Seiqua Oss, Ul ee ee wii af 


a " ’ 
a sees Soars -2sigae S90 go no leoeiea A807 aq NSt. ou baba 


.* 


“Wat @uvFs+ oF amis sic ,2eing7 Oe armrest renyeu, seeaa 


Ls 


Syasan acte> noiw renet-sees pesnvers Sh © samee ant el es set 
+2707 83207 ‘ame? Sar et ,amas eategeroty VISUE - asibette 


"2 bepeeserget domti sernq2et sng nsswnes  Semetet yb 
‘ rt 


. oy? 626 Ont? -22¢37 cs i saevedaae $97 21 € tee A 2dr 
= 2 F 

} “37 Wis2eeust A inlod' .esitemp- sty %6 sehr +cleepot 

azic ewait! ie : sr peg seetiuuy 8Gs.92 oo ¢# 
: rf 

=¢ 284 QS oF oe & gitiezge1get © Foi 05 ~2ast 

' iy 
Sete? ms2 see2 | THO? NetegeeA «|. PRGC 222d 
’ adil seso78 357 SviJAL et gaddatitteniest +i 28 
S AC o oF Sw eny g rete trib 


i : 
oe Bee L6 1& NOS 2eg008 


v2 


VLV@ GINYNLIY 40 AZIS “SA JWIL JSNOdS3Y - & 3NNII4 


Ppausuniay seal dn, 
O00T 006 008 002 009 00S 00” OO€ 002 o0oT 0 
pesyspAQ’ ATong+Ue 18 Boag Pe: (St aie Fil ane 


: H s 
H 1 ; Hy 

ca 
i ee amet - alee 


ae emer ewes sere 
ween ret ee cer wees: 


- awerenes oe 


| setdny 024q-001; 
ONS uInIaY OF out y | 


eouduemenceensccaseemeh os ceee 


| , | 
i i H iD ; 
eee, Oa ee! eS ek cm pee ne ee ae se ed oh we : 
| i i i i i i 
| 
; H H i | | ; 
i i i : H i i 
SYOOTd XSTC! | | 
ey emer bese ee ge nese sone eepah ne newenseemer: amen de eonemes as reneraneennnnees ‘5 apn rnnnste mene tema nenere bee treme cmennen ene d= mennreee +3 = = 
0&¢ SSOSSV OF OUT LL | | 1 i 
ee. 2 ee tte IE Ele a 
| ! | ; : ' 
besos Sines i ceed ode ¢ APES os ae can oe oe ae ch Prec Eee t {hee artes ot 
H H } ; | H i ! j H 
| i i | 
i H i i H H i - ; 
aa eR Se ee | gee oe a ate ae i Ot 8 ee See Geer ee 
: : 
: i I i H i 
i i i i : i 
: | | 
H j H | ; f 
| | | : | | 
ee Jrrveceseseeenecees fren eeeee ~4-—---- =~ + Seerass senefecnne meee ; mencceeeehemee eed 
; : : i ! : 
i i i | | i 
| | | 
| 
| I i ! 


ee er eet ee eee ett ii iaet tt tererte fet renin oot rire errors 


Sar any a1h9 ool 
Jo yesaladsy yyy Ay 
B1e9 PaUuN{saYy JO AZ4S “SA Buy), asuodsay 


$pu0Ias5 


60 


e760 Serves 


Seals 7 2 
d noreal 29 -omt Ssmnoderh 3 


oh OF yI5poK £ 
astaut a A 
aie ae 


i. 
. 
a9 
i: 
4 


} 
{ 
ng 
{ 


- 
i 
| 
a oe 
j 
} 


nee 
Te ee ee a a 
’ 


_ 


- I Diet 


» AGA OF omit T 
‘ma ford 


on od. ont TL =4 
T otdd-00! : 


a. ee 
ie al ad ay Vea 


oot ect 


an 7 


‘, 
me, 


The overnead tor all the queries shown on tre erapn 
is the same and is represented by thé common intercept of 
foe, Vertical axis. It the time represented by Point F 15 
adjusted for the overhead and tne time to access 254 blocks, 
chen the result is the time to retutn 58¢ 1¢d—-pyte tuples. 
Theretore, the use of one query Set nas identified rates for 
accessing data (in blocks per second) and returning iata (in 
bytes per second) 

5. Phase Inree 
arter tne time elements Aave been measured, a set of 

queries are performed which measure the effect of methoas 
usei to improve the system response. An @xemple of this 2s 
tne use of inderes. Theoretically, tae wse of indexes 
should improve system performance Dy decreasing the amount 
of data accessed. However, tne index nust pe accessea and 
processed. Areas of j[nterest here involve determining at 
what point, if any, does the use of indexes become impor- 
tant. Therefore, performance on indexed relations is 
measured over a wide range, What type ef index (i.e., clu=- 
stered or non-clustered) provides tne vest pertormance and 
what are the trade-offs? Woat scope of indices (i.e., one 
attribute, two, or more ) provides tne test pertormance? 
The latter question may be one dependent on the application. 
intestine the ADM 11¢0, it nas bpeen noted that, if the 
index is defined when the relation is being created, then 


the size of a relation witn a clustered index is larger then 
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ine Size of the same relation if the index is defined. after 
the data has been entered into tne relation. Taps 52s" .bem 
caus@€ tne loading algoritnom assumes a normal distritution of 
key values, while the data is in key seyuence. data loaded 
Bas b€€n gfenerated already sorted. 

Additional testing snould be pertormed to get a 
“feel” of tne system. By p@cominge familiar witn tne sy- 
stem’s capabilities, tne testing personnel snould be atle 19 
determine interesting lines of experimentation. Areas of 
Enpterest inciude tbe overkead associated with projection 
enerations, the use of string compression techniques, and 
tne efficiency of join operations (in different évailapbile 
memory configurations, when available). 

4. Pnase Four — Verification 

The last phase takes place after the other tests 
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tests Should provide some meaningful results about system 
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Pueayion phase serves to perform tests which verity © or 
disprove tne analysis of the previous tests. Lt. 2kS50 > pror 
ates an CDPOrtunity to redo any tests which appear errone— 
ous or suspicious. In this phase, additional tests may take 
gavantece of tne flexibility designed into the synthetic 
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C. SUMMARY 

Investigation otf the pertormance of several contieura- 
tions of a backénd relational database macnine nas provided 
considerable insignt into what may be a sound obasis for 
general performance testing on relational DBMSs. In tads 
thesis, a methodology has been laid out and the initial 
phases to be taken in tnat métnodology nave been derined. A 
complete framework for suoseuuent phases nas not heen fully 
dev2loped, out their contents nave been discussed. Wnile 
toe tests described relate to a Specific sertes of relatiou— 
al database nacnines, tne basic metnodology may apply to 


relational databas® macnines. 


: e \a 
taste = ieee ienos : ana . 


_ 


ot. aah _ eet 2 sé yor Pea» orm dentin it suse abhi 


> 
a at “et er | semoaieaal oo erivesa seer ated Bsn: 


wer 


deine one re gue the | apea en cnoueronzen t Bot 


i 7 


7 e eaentiet eer Svob Lepeayar’ ren’ at ‘noah, boon ates 


. / ars: een por ‘yes #senma boeupaet ve 70% st 
c Seas 1 beeen eit ‘hee eves ets canitea fens t40. ne 


Tarte e.es. 16 server SS SLO8GE & 94 otsiet nec lisest 2 are 7 a 


“ EsgG*® TEM WHO. SonIST sles eng, Set clatnasaad sesgeis aoe 


es. 
_eomtonen eesdrras Lenott ret 
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As Databases ~- catalog ot databases in tne system 
ee Disks —- list of diSkS known to SyStem 
ie Loee.= used by IDM for concurrency control 


4, Configure —- information about serial and parall2l in- 
terraces, checkpoint interval 


Ds Detastat - information about current aetivity®in the 
IDM 


Database Tables 


i. Relation - catalog of all objects (relation, view, 
stored command) in tne database 


As Attribute - catalog of @acn attribute of e@ach relation 
De Indices - catalog of indices that exist in tne databddse 


4. Prenect ~ cataloge»ot protection intkermation im the. da= 
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Se Query ~- stored commands and view 
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